.. image:: images/logo.png

-------------------------------------

DSZI models
'''''''''''

What are DSZI models?
=====================

DSZI is an acronym for "Defective Subpopulation Zero Inflated". It is a combination of the Defective Subpopulation (DS) model and the Zero Inflated (ZI) model.

A defective subpopulation model is where the CDF does not reach 1 during the period of observation. This is caused when a portion of the population fails (known as the defective subpopulation) but the remainder of the population does not fail (and is right censored) by the end of the observation period.

A zero inflated model is where the CDF starts above 0 at the start of the observation period. This is caused by many "dead-on-arrival" items from the population, represented by failure times of 0. This is not the same as left censored data since left censored is when the failures occurred between 0 and the observation time. In the zero inflated model, the observation time is considered to start at 0 so the failure times are 0.

In a DSZI model, the CDF (which normally goes from 0 to 1) goes from above 0 to below 1, as shown in the image below. In this image the scale of the PDF and CDF are normalized so they can both be viewed together. In reality the CDF is much larger than the PDF.

.. image:: images/DSZI_explained.png

A DSZI model may be applied to any distribution (Weibull, Normal, Lognormal, etc.) using the transformations explained in the next section.

The plot below shows how a Weibull distribution can become a DS_Weibull, ZI_Weibull and DSZI_Weibull. Note that the PDF of the DS, ZI, and DSZI models appears smaller than that of the original Weibull model since the area under the PDF is no longer 1. This is because the CDF does not range from 0 to 1.

..
.. image:: images/DSZI_combined.png

Equations of DSZI models
========================

A DSZI Model adds a minor modification to the PDF and CDF of any standard distribution (referred to here as the "base distribution") to transform it into a DSZI Model. The transformations are as follows:

:math:`PDF_{DSZI} = PDF_{base} \times (DS-ZI)`

:math:`CDF_{DSZI} = CDF_{base} \times (DS-ZI) + ZI`

In the above equations the base distribution (represented by :math:`PDF_{base}` and :math:`CDF_{base}`) is transformed using the parameters DS and ZI. DS is the maximum of the CDF which represents the fraction of the total population that is defective (the defective subpopulation). ZI is the minimum of the CDF which represents the fraction of the total population that failed at t=0 or equivalently were "dead-on-arrival" (the zero inflated fraction).

To create only a DS model we can set ZI as 0. To create only a ZI model we can set DS as 1. The parameters DS and ZI must be between 0 and 1, and DS must be greater than ZI.

The above equations can be expanded depending on the equation of the base distribution. For example, if the base distribution is a two parameter Weibull distribution, the DSZI model would be:

:math:`\text{PDF:} \hspace{11mm} f(t) = \frac{\beta}{\alpha}\left(\frac{t}{\alpha}\right)^{(\beta-1)}{\rm e}^{-(\frac{t}{\alpha })^ \beta } \left(DS - ZI \right)`

:math:`\text{CDF:} \hspace{10mm} F(t) = \left(1 - {\rm e}^{-(\frac{t}{\alpha })^ \beta }\right) \left(DS - ZI \right) + ZI`

The SF, HF and CHF can be obtained using transformations from the CDF and PDF using the `relationships between the five functions `_.

Creating a DSZI model
=====================

Within reliability, the DSZI Model is available within the Distributions module. The input requires the base distribution to be specified using a distribution object and the DS and ZI parameters to be specified if required. DS defaults to 1 and ZI defaults to 0. The output API matches the API for the standard distributions.

..
.. admonition:: API Reference

   For inputs and outputs see the `API reference `_.

Example 1
---------

In this first example, we will create a Gamma DSZI model and plot the 5 functions.

.. code:: python

    from reliability.Distributions import Gamma_Distribution, DSZI_Model

    # Gamma base distribution with 30% dead-on-arrival and an 80% CDF maximum
    model = DSZI_Model(distribution = Gamma_Distribution(alpha=50,beta=2), DS= 0.8, ZI=0.3)
    model.plot()

.. image:: images/DSZI_example1.png

Example 2
---------

In this second example, we will create a Lognormal_DS model, draw some random samples and plot those samples on the survival function plot.

.. code:: python

    from reliability.Distributions import Lognormal_Distribution, DSZI_Model
    from reliability.Probability_plotting import plot_points
    import matplotlib.pyplot as plt

    # ZI is left at its default of 0 so this is a DS model only
    model = DSZI_Model(distribution = Lognormal_Distribution(mu=2,sigma=0.5), DS= 0.75)
    failures, right_censored = model.random_samples(50,seed=7, right_censored_time = 50)
    model.SF()
    plot_points(failures = failures, right_censored = right_censored, func="SF")
    plt.show()

.. image:: images/DSZI_example2.png

Note that in the above example, the random_samples function returns failures and right_censored values. This differs from all other Distributions, which only return failures. Both are returned because right_censored data is essential for a DS Model.

Fitting a DSZI model
====================

.. admonition:: API Reference

   For inputs and outputs see the API reference for `Fit_Weibull_DS `_, `Fit_Weibull_ZI `_, and `Fit_Weibull_DSZI `_.

As we saw above, the DSZI_Model can be either DS, ZI, or DSZI depending on the values of the DS and ZI parameters. Within the Fitters module, three functions are offered, one for each of these cases, with the Weibull_2P distribution as the base distribution. The three Fitters available are Fit_Weibull_DS, Fit_Weibull_ZI, and Fit_Weibull_DSZI.

If your data contains zeros then only the Fit_Weibull_ZI and Fit_Weibull_DSZI fitters are appropriate.
Using anything else will cause the zeros to be automatically removed and a warning to be printed.

Fit_Weibull_ZI does not mandate that the failures contain zeros, but if failures does not contain zeros then ZI will be 0 and the alpha and beta parameters will be equivalent to the results from Fit_Weibull_2P.

Fit_Weibull_DS does not mandate that right_censored data is provided, but if right_censored data is not provided then DS will be 1 and the alpha and beta parameters will be equivalent to the results from Fit_Weibull_2P.

Fit_Weibull_DSZI does not mandate that failures contain zeros or that right_censored data is provided. If right_censored data is not provided then DS will be 1. If failures does not contain zeros then ZI will be 0. If failures does not contain zeros and no right censored data is provided then DS will be 1, ZI will be 0 and the alpha and beta parameters will be equivalent to the results from Fit_Weibull_2P.

Example 3
---------

In this example, we will create 70 samples of failure data from a Weibull Distribution, and append 30 zeros to it. We will then use Fit_Weibull_ZI to model the data.

..
.. code:: python

    from reliability.Distributions import Weibull_Distribution
    from reliability.Fitters import Fit_Weibull_ZI
    from reliability.Probability_plotting import plot_points
    import numpy as np
    import matplotlib.pyplot as plt

    # 70 Weibull failure times plus 30 zeros (the dead-on-arrival items)
    data = Weibull_Distribution(alpha=200, beta=5).random_samples(70, seed=1)
    zeros = np.zeros(30)
    failures = np.hstack([zeros, data])

    plt.subplot(121)
    fit = Fit_Weibull_ZI(failures=failures)
    plt.subplot(122)
    fit.distribution.CDF()
    plot_points(failures=failures)
    plt.tight_layout()
    plt.show()

    '''
    Results from Fit_Weibull_ZI (95% CI):
    Analysis method: Maximum Likelihood Estimation (MLE)
    Optimizer: TNC
    Failures / Right censored: 100/0 (0% right censored)

    Parameter  Point Estimate  Standard Error  Lower CI  Upper CI
        Alpha         192.931         5.33803   182.747   203.682
         Beta         4.53177        0.431272   3.76064   5.46102
           ZI             0.3       0.0458258  0.218403  0.396613

    Goodness of fit    Value
     Log-likelihood -426.504
               AICc  859.259
                BIC  866.824
                 AD  5.88831
    '''

.. image:: images/DSZI_example3.png

We can see above how the fitter correctly identified that the distribution was 30% zero inflated, and it did a reasonable job of finding the alpha and beta parameters of the base distribution.

Example 4
---------

In this example, we will use Fit_Weibull_DS to model some data that is heavily right censored. The DS=0.4 parameter means that only 40% of the data is failure data, with the rest being right censored. The original distribution is overlaid in the plot for comparison of the goodness of fit.

..
.. code:: python

    from reliability.Distributions import DSZI_Model, Weibull_Distribution
    from reliability.Fitters import Fit_Weibull_DS
    import matplotlib.pyplot as plt
    from reliability.Probability_plotting import plot_points

    # True model: only 40% of the population is in the defective subpopulation
    model = DSZI_Model(distribution=Weibull_Distribution(alpha=70, beta=2.5), DS=0.4)
    failures, right_censored = model.random_samples(100, right_censored_time=120, seed=3)

    model.CDF(label="true model", xmax=300)
    fit_DS = Fit_Weibull_DS(failures=failures, right_censored=right_censored, show_probability_plot=False)
    fit_DS.distribution.CDF(label="fitted Weibull_DS", xmax=300)
    plot_points(failures=failures, right_censored=right_censored)
    plt.legend()
    plt.show()

    '''
    Results from Fit_Weibull_DS (95% CI):
    Analysis method: Maximum Likelihood Estimation (MLE)
    Optimizer: TNC
    Failures / Right censored: 41/59 (59% right censored)

    Parameter  Point Estimate  Standard Error  Lower CI  Upper CI
        Alpha         67.9275         4.61424   59.4599   77.6009
         Beta         2.63207        0.357826    2.0164   3.43571
           DS        0.414739       0.0500682  0.321106  0.514964

    Goodness of fit    Value
     Log-likelihood -254.236
               AICc  514.721
                BIC  522.287
                 AD  374.746
    '''

.. image:: images/DSZI_example4.png

Example 5
---------

In this example, we will use some real world data from a vehicle manufacturer, which is available in the Datasets module. This example shows how the Weibull_2P model can be an inappropriate choice for a dataset that is heavily right censored. In addition to the visual evidence provided by the probability plot (left) and the CDF (right), we can see that the goodness of fit criteria indicate that Weibull_DS was much better (closer to zero) than Weibull_2P.

..
.. code:: python

    from reliability.Fitters import Fit_Weibull_DS, Fit_Weibull_2P
    import matplotlib.pyplot as plt
    from reliability.Probability_plotting import plot_points
    from reliability.Datasets import defective_sample

    failures = defective_sample().failures
    right_censored = defective_sample().right_censored

    plt.subplot(121)
    fit_DS = Fit_Weibull_DS(failures=failures, right_censored=right_censored)
    print('-------------------------------------------')
    fit_2P = Fit_Weibull_2P(failures=failures, right_censored=right_censored)

    plt.subplot(122)
    fit_DS.distribution.CDF(label="fitted Weibull_DS",xmax=1000)
    fit_2P.distribution.CDF(label="fitted Weibull_2P",xmax=1000)
    plot_points(failures=failures, right_censored=right_censored)
    plt.ylim(0,0.25)
    plt.legend()
    plt.title('Cumulative Distribution Function')
    plt.suptitle('Comparison of Weibull_2P with Weibull_DS')
    plt.gcf().set_size_inches(12,6)
    plt.tight_layout()
    plt.show()

    '''
    Results from Fit_Weibull_DS (95% CI):
    Analysis method: Maximum Likelihood Estimation (MLE)
    Optimizer: TNC
    Failures / Right censored: 1350/12295 (90.10627% right censored)

    Parameter  Point Estimate  Standard Error  Lower CI  Upper CI
        Alpha         170.983         4.61716   162.169   180.276
         Beta         1.30109       0.0297713   1.24403   1.36077
           DS         0.12482      0.00333709  0.118425  0.131509

    Goodness of fit    Value
     Log-likelihood -11977.7
               AICc  23961.3
                BIC  23983.9
                 AD  27212.4

    -------------------------------------------
    Results from Fit_Weibull_2P (95% CI):
    Analysis method: Maximum Likelihood Estimation (MLE)
    Optimizer: TNC
    Failures / Right censored: 1350/12295 (90.10627% right censored)

    Parameter  Point Estimate  Standard Error  Lower CI  Upper CI
        Alpha         10001.5         883.952    8410.7   11893.1
         Beta        0.677348        0.016663  0.645463  0.710807

    Goodness of fit    Value
     Log-likelihood -12273.2
               AICc  24550.3
                BIC  24565.4
                 AD    27213
    '''

.. image:: images/DSZI_example5.png

Example 6
---------

In this example we will create a DSZI model with DS=0.7 and ZI=0.2.
Based on these parameters, we expect around 70% of the random samples to be failures, with zeros making up around 20% of the total samples (failures + right censored) due to the zero inflated fraction. We draw the random samples from the model and then fit a Weibull_DSZI model to the data. The result is surprisingly accurate, showing DS=0.700005 and ZI=0.22, with the alpha and beta parameters closely resembling the parameters of the input Weibull Distribution. The plot below shows the CDF on the Weibull probability plot (left) and on linear axes (right), which each provide a different perspective of how the distribution models the failure points.

.. code:: python

    from reliability.Distributions import DSZI_Model, Weibull_Distribution
    from reliability.Probability_plotting import plot_points
    import matplotlib.pyplot as plt
    from reliability.Fitters import Fit_Weibull_DSZI

    model = DSZI_Model(distribution=Weibull_Distribution(alpha=1200,beta=3),DS=0.7,ZI=0.2)
    failures, right_censored = model.random_samples(100,seed=5,right_censored_time=3000)

    # Left: Weibull probability plot with the fitted and true models
    plt.subplot(121)
    fit = Fit_Weibull_DSZI(failures=failures,right_censored=right_censored,label='fitted Weibull_DSZI')
    model.CDF(label='true model')
    plt.legend()

    # Right: the same comparison on linear axes
    plt.subplot(122)
    fit.distribution.CDF(label='fitted Weibull_DSZI')
    model.CDF(label='true model')
    plot_points(failures=failures,right_censored=right_censored)
    plt.legend()
    plt.tight_layout()
    plt.show()

    '''
    Results from Fit_Weibull_DSZI (95% CI):
    Analysis method: Maximum Likelihood Estimation (MLE)
    Optimizer: TNC
    Failures / Right censored: 70/30 (30% right censored)

    Parameter  Point Estimate  Standard Error  Lower CI  Upper CI
        Alpha         1170.12         68.0933   1043.99   1311.49
         Beta         2.60255        0.299069   2.07771   3.25997
           DS        0.700005        0.045826  0.603391  0.781602
           ZI            0.22       0.0414247  0.149465  0.311627

    Goodness of fit    Value
     Log-likelihood -463.613
               AICc  935.647
                BIC  945.646
                 AD  166.025
    '''

.. image:: images/DSZI_example6.png

The DSZI model is a model of my own making.
It combines the well established DS and ZI models together for the first time to enable heavily right censored data to be modelled using a DS distribution while also allowing for zero inflation of the failures.
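
As a supplementary illustration of the transformations given in the equations section, the following is a minimal sketch in plain Python that applies them to a two parameter Weibull base distribution. It does not use the reliability package. The function names (``weibull_pdf``, ``weibull_cdf``, ``dszi_pdf``, ``dszi_cdf``) and the parameter values are hypothetical and were chosen only for illustration.

.. code:: python

    import math

    def weibull_cdf(t, alpha, beta):
        """CDF of the two parameter Weibull base distribution."""
        return 1.0 - math.exp(-((t / alpha) ** beta))

    def weibull_pdf(t, alpha, beta):
        """PDF of the two parameter Weibull base distribution."""
        return (beta / alpha) * (t / alpha) ** (beta - 1) * math.exp(-((t / alpha) ** beta))

    def check_parameters(DS, ZI):
        """DS and ZI must lie between 0 and 1, and DS must be greater than ZI."""
        if not 0 <= ZI < DS <= 1:
            raise ValueError("DS and ZI must satisfy 0 <= ZI < DS <= 1")

    def dszi_cdf(t, alpha, beta, DS=1.0, ZI=0.0):
        """CDF_DSZI = CDF_base * (DS - ZI) + ZI"""
        check_parameters(DS, ZI)
        return weibull_cdf(t, alpha, beta) * (DS - ZI) + ZI

    def dszi_pdf(t, alpha, beta, DS=1.0, ZI=0.0):
        """PDF_DSZI = PDF_base * (DS - ZI)"""
        check_parameters(DS, ZI)
        return weibull_pdf(t, alpha, beta) * (DS - ZI)

    # Illustrative (hypothetical) parameter values
    alpha, beta, DS, ZI = 70, 2.5, 0.8, 0.2

    cdf_start = dszi_cdf(0, alpha, beta, DS, ZI)   # CDF at t=0 equals ZI
    cdf_end = dszi_cdf(1e6, alpha, beta, DS, ZI)   # CDF at very large t approaches DS
    print(cdf_start, cdf_end)

With the default values of DS=1 and ZI=0, both functions reduce to the base Weibull distribution, which mirrors the defaults described for the DSZI_Model.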